The self-assembly of DNA Holliday junctions studied with a minimal model 
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In this paper, we explore the feasibility of using coarse-grained models to simulate the self-assembly 
of DNA nanostructures. We introduce a simple model of DNA where each nucleotide is represented 
by two interaction sites corresponding to the sugar-phosphate backbone and the base. Using this 
model, we are able to simulate the self-assembly of both DNA duplexes and Holliday junctions from 
single-stranded DNA. We find that assembly is most successful in the temperature window below 
the melting temperatures of the target structure and above the melting temperature of misbonded 
aggregates. Furthermore, in the case of the Holliday junction, we show how a hierarchical assembly 
mechanism reduces the possibility of becoming trapped in misbonded configurations. The model 
is also able to reproduce the relative melting temperatures of different structures accurately, and 
allows strand displacement to occur. 

PACS numbers: 87.14.gk,81.16.Dn,87.15.ak 



I. INTRODUCTION 

The ability to design nanostructures which accurately 
self-assemble from simple units is central to the goal 
of engineering objects and machines on the nanoscale. 
Without self-assembly, structures must be laboriously 
constructed in a step by step fashion. Double-stranded 
DNA (dsDNA) has the ideal properties for a nanoscale 
building block, ^'^ with structural length scales deter- 
mined by the separation of base pairs, the helical 
pitch and its persistence length (approximately 0.33 nm, 
3.4 nm (Ref. 3) and 50 nm,'' respectively). Over these 
distances, dsDNA acts as an almost rigid rod and so it is 
capable of forming well-defined three dimensional struc- 
tures. 

It is the selectivity of base pairing between single 
strands, however, that makes DNA ideal for controlled 
self-assembly. By designing sections of different strands 
to be complementary, a certain configuration of a system 
of oligonucleotides can be specified as the global mini- 
mum of the energy landscape. In this way the target 
structure (usually consisting of branched double helices) 
can be 'programmed' into the sequences. This approach 
was initially demonstrated for a four-armed junction by 
the Seeman group in 1983.^ Such junctions and more 
rigid double crossover motifs^ can then be used to cre- 
ate two-dimensional lattices. ^'^ Yan et al.^ have also con- 
structed ribbons and two dimensional lattices from larger 
four-armed structures, each arm consisting of a junction 
of four strands. Furthermore, using Rothemund's DNA 
"origami" approach an almost arbitrary variety of two- 
dimensional shapes can be created. '^^ 

Progress in forming three-dimensional DNA nanos- 
tructures was initially much slower. The Seeman goup 
managed to synthesize a DNA cube^"' and a truncated 
octahedron, ^^ but only after a long series of steps and 



with a low final yield. More recently, approaches have 
been developed that allow polyhedral cages, such as 
tetrahedra,^'^ trigonal bipyramids,^^ octahedra,'^^-'^^ do- 
decahedra and truncated icosahedra,'^^ to been obtained 
in high yields simply by cooling solutions of appropri- 
ately designed oligonucleotides from high temperature. 
Additional structures have also been produced using pre- 
assembled modular building blocks incorporating other 
organic molecules. '^^''^^ 

In designing strand sequences, it is important to min- 
imize the stability of competing structures with respect 
to the stability of the target configuration. In addition, if 
systems can be designed to follow certain routes through 
configuration space — for example, by the hierarchical as- 
sembly of simple motifs^" — the target can potentially be 
reached more efficiently. A standard approach to hier- 
archical assembly, such as that described by He et al.,^'' 
involves choosing sequences so that bonds between dif- 
ferent pairs of oligonucleotides become stable at different 
temperatures. This allows certain motifs to form in iso- 
lation at high temperatures before bonding to each other 
as the solution is cooled. An alternative, elegant system 
for programming assembly pathways has been proposed 
by Yin et al?^ , which relies on the metastability of single 
stranded loop structures and the possibility of catalyzing 
their interactions using other oligonucleotides. 

Given these recent experimental advances in creating 
DNA nanostructures, it would be useful to have theo- 
retical models that allow further insights into the self- 
assembly process. In particular, a successful model would 
be able to provide information on the formation path- 
ways and free energy landscape associated with the self- 
assembly, and as such would be of use to experimentalists 
wishing to consider increasingly more complex designs. 
Atomistic simulations of DNA would offer potentially the 
most spatially-detailed descriptions of the self-assembly. 



However, they are computationally very expensive, and 
are generally restricted to time scales that are too short 
to study self-assembly.^^ 

Statistical approaches such as that of Poland and 
Scheraga^'^ and the nearest-neighbour model^^ use sim- 
ple expressions for the free energy of helix and random 
coil states to obtain equilibrium results for the bonding 
of two strands. Whilst the parameters in these models 
can be tuned to give very accurate correspondence with 
experimental data,^^'^^ they give no information on the 
dynamics and formation pathways and hence are only 
useful for ensuring that the target structure has signifi- 
cantly lower free energy than competing configurations. 
Furthermore, any description purely based on secondary 
structure (i.e. which bases are paired) is inherently inca- 
pable of accounting for topological effects such as linking 
of looped structures. ^^ 

Coarse-grained or minimal models offer a compromise 
between detail and computational simplicity, and are well 
suited to the study of hybridization of oligonucleotides. 
The aim of these models is to be capable of describing 
both the thermodynamic and kinetic behaviour of sys- 
tems, a vital feature if kinetic metastability is inherent in 
assembly pathways. ^"'^ In developing such minimal mod- 
els the approach is usually to retain just those physical 
features of the system that are essential to the behaviour 
that is of interest. 

Dauxois, Peyrard and Bishop models,^* and modified 
versions such as that proposed by Buyukdagli, Sanrey, 
and Joyeux,^^ constitute the simplest class of dynamical 
models. Although these are dynamic models in the sense 
that the energy is a function of the separation between 
each base, the nucleotides are constrained to move in one 
dimension. This lack of conformational freedom means 
that these models are incapable of capturing the nuances 
of the self-assembly from single-stranded DNA (ssDNA). 

Recently, models have been proposed which capture 
the helicity of dsDNA using two'^" or three^^ interac- 
tion sites per nucleotide. These models, however, are 
optimized for studying deviations from the ideal double- 
stranded state, and so have not been used to examine 
self-assembly. Although they have been used to study 
the thermal denaturation of dsDNA, it is essential for our 
purposes to be able to simulate the assembly of a struc- 
ture from ssDNA as it is this process that will reveal the 
kinetic traps and free energy landscape associated with 
the formation of a particular DNA nanostructure. 

Simpler, linear models, also with two interaction sites 
per nucleotide, have been used to investigate duplex 
hybridization, ■^■^ hairpin formation"^'^ and gelation of col- 
loids functionalized with oligonucleotides.'^'* These mod- 
els all use two interaction sites to represent one nu- 
cleotide, with backbone sites linked to each other to rep- 
resent the sugar-phosphate chain, and interaction sites 
which represent the bases. This work investigates the 
possibility of extending the use of such coarse-grained 
models to study the self-assembly of nanostructures that 
involve multiple strands forming branched duplexes. We 





FIG. 1: (Colour online) A schematic representation of the 
model. The thick lines represent the rigid backbone monomer 
units and the large circles the repulsive Lennard- Jones inter- 
actions at their centres. The smaller, darker circles represent 
the bases. The panels illustrate the definitions of (a) the bend- 
ing angle between two units (6), and (b) the torsional angle 
(<jf>) which is found after the monomers have been rotated to 
lie parallel. 



hypothesize that the self-assembly properties of DNA are 
dominated by the fact that ssDNA is a semi-flexible poly- 
mer with selective attractive interactions. We introduce 
an extremely simple model, similar to that of Starr and 
Sciortino, to test this hypothesis. ■^^ This simplicity en- 
ables us to explore the thermodynamics and kinetics of 
self-assembly in the model in great depth, and hence ex- 
amine the feasibility of simulating nanostructure forma- 
tion through minimal models. 

We first describe the model in Section II, then exam- 
ine its success in reproducing the general features of hy- 
bridization in Section III A. Next in Section IIIB, we 
apply it to the formation of a Holliday junction, a simple 
nanostructure consisting of a four-armed cross.* 



II. METHODS 

A. Model 

We introduce an off-lattice model inspired by that 
which Starr and Sciortino used to study the gelation of 
four-armed DNA dendrimers.^^ As our aim is to repro- 
duce the basic physics with as simple a model as possible, 
we neglect contributions to the interactions due to base 
stacking, and the charge and asymmetry of the phosphate 
backbone. We do not attempt to include the detailed ge- 
ometrical structure of DNA, but instead represent the 
oligonucleotides as a chain of monomer units, each corre- 
sponding to one nucleotide (Fig. 1). A monomer consists 
of a rod (chosen to be rigid for simplicity) of length I with 
a repulsive backbone interaction site at the centre of the 
rod. In addition, each unit has a bonding interaction 
site (or base) at a distance of 0.3 Z from the backbone 



site (perpendicular to the rod). Each monomer is also 
assigned a base type (A,G,C,T) to model the selective 
nature of bonding. In this model we only consider bonds 
between the complementary pairs A-T and G-C. 

We do not explicitly include any solvent molecules in 
our simulations, but instead use effective potentials to 
describe the interactions between the DNA. Sites inter- 
act through shifted-force Lennard- Jones (LJ) potentials, 
where, as well as truncating and shifting the potential, an 
extra term is included to ensure the force goes smoothly 
to zero at the cutoff Tc ■ For r < r^ 



V,i{r) = Vu{r) - T/L,T(r,) - (r - r,) 



dVi 



LJ 



dr 



where 



Vi,T(r)=4e 



(1) 



(2) 



and Vs[{r) = for r > Tc. Backbone sites (except adja- 
cent units on the same strand) interact through Eq. (1) 
with a = I and re = 2^'^a. This purely repulsive interac- 
tion models the steric repulsion between strands. Bond- 
ing sites (again excluding adjacent units on a strand) in- 
teract via Eq. (1) with a = 0.35 1 and fc = 2.5 a for com- 
plementary bases (to allow for attraction) and r^ — 2^/^cr 
for all other pairings. The depth of the resulting poten- 
tial well between complementary bases, e^fg;,; i^ 0.396e. 
In what follows we will measure the temperature in terms 
of a reduced temperature, T* = ksT/e^^^. The above 
choice of parameters ensures that the attractive inter- 
action between complementary bases is largely shielded 
by backbone repulsion. Monomers therefore bond selec- 
tively and can only bond strongly to one other monomer 
at a given instant. These are the key features of Watson- 
Crick base pairing that make DNA so useful for self- 
assembly. 

The model also includes potentials between consecu- 
tive monomers associated with bending and twisting the 
strand: 



Vb 



end 



fci(l-cos(6')) 
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if0<f, 
otherwise 



and 



Vtwist = hil -cos((/))). 



(3) 



(4) 



We define 6 as the angle between the vectors along ad- 
jacent monomer rods. As previously mentioned, consec- 
utive backbone sites do not interact via LJ potentials. 
Instead, a hard cutoff is introduced in Eq. (3) to reflect 
the fact that an oligonucleotide cannot double back on 
itself. <j> is taken as the angle between adjacent backbone 
to bonding site vectors after the monomers have been ro- 
tated to lie parallel (Fig. 1). For simplicity, we choose the 
torsional potential to have a minimum at (f) — 0. Thus, 
neither ssDNA or dsDNA will be helical in our model. 



fci is chosen to be 0.1 e to give a persistence length, 
Ips = 3.149 Z at a reduced temperature of T* = 0.09677 
for ssDNA. We obtain this result by simulating a single 
strand 70 bases in length, and using the definition:"^^ 



^ps 



(L-li) 



(5) 



where L is the end to end vector of the strand, li is the 
vector associated with the first monomer and () indicates 
a thermal average. Taking / to be 6.3A, T* == 0.09677 is 
mapped to 24° C and so the model is consistent with ex- 
perimental data for ssDNA in 0.445M NaCl solution. ^^■^'^ 
fc2 is chosen to be 0.4 e. 

In neglecting the geometrical structure of a double he- 
lix we do not accurately represent certain types of bond- 
ing. 'Bulged' bonding occurs when consecutive bases in 
one of the strands attach to non-consecutive bases in the 
other strand. 'Internal loops' consist of stretches of non- 
complementary bases (either symmetric or asymmetric in 
the number of bases involved in each strand). 'Hairpins' 
result when a single strand doubles back and bonds to 
itself. The details of these motifs are complicated but an 
empirical description of their thermodynamic properties 
is given in Ref. 26. Importantly, they are generally pe- 
nalized due to the disruption of the geometry of DNA in 
a way which is not well reproduced by our model. In the 
case of the short strands we consider, these motifs will 
only play a small role as the strands are not specifically 
intended to have stable structures of these forms. In fact, 
as the base sequences we use were designed to form the 
Holliday junction, the possibility of forming these motifs 
at relevant temperatures was deliberately avoided.'^* 

For simplicity we therefore include only two alter- 
ations to the model. Firstly, we define 'kinked states' 
as those for which the number of unpaired bases be- 
tween two bonding pairs on either side of a duplex is 
not equal (including asymmetric loops and bulges). We 
impose an infinite energy penalty on the formation of 
these kinked states if the total number of intermediate 
bases is less than six. Secondly, we treat complementary 
units within six bases of each other on the same strand as 
non-complementary, but allow all other hairpins without 
penalty. 

It should be also noted that this model neglects the 
directional asymmetry of the sugar-phosphate backbone. 
Therefore, parallel as well as anti-parallel bonding is pos- 
sible in our model, whereas parallel bonding does not 
occur in experiment. 



B. Monte Carlo Simulation 

In a fully-atomistic model of DNA the natural way to 
simulate its dynamics would be to use molecular dynam- 
ics. However, the best way to simulate the dynamics in 
a coarse-grained model is an important, but not fully re- 
solved, question, and one that will depend on the nature 
of the model. Clearly, for the current model standard 



molecular dynamics is inappropriate as it will lead to bal- 
listic motion of the strands between collisions because of 
the absence of explicit solvent particles, whereas DNA in 
solution undergoes diffusive Brownian motion. An alter- 
native approach is to use Metropolis Monte Carlo (MC) 
algorithm'^^''*'^ where the moves are restricted to be lo- 
cal, as it has been argued that this can provide a rea- 
sonable approximation to the dynamics. ^^""^^ This is the 
approach that we use here to simulate the dynamics of 
self-assembly of DNA duplexes and Holliday junctions. 
In particular, the local MC moves that we use are trans- 
lation and rotation of whole strands and bending of a 
strand about a particular monomer, thus ensuring that 
the strands undergo an approximation to diffusive Brow- 
nian motion in the simulations. Therefore we expect the 
MC simulations, which are all initiated with free single 
strands, to mimic the real self-assembly processes in our 
model. It is important to note that this will include, as 
well as successful assembly into the target structure, ki- 
netic trapping in non-equilibrium configurations and that 
when the latter occurs this reflects the inefficiency of the 
self-assembly under those conditions. 

Although a true measure of time is impossible in 
Monte Carlo simulations, an approximate time scale for 
diffusion-limited processes can be found by comparing 
the diffusive properties of objects to experiment. By mea- 
suring the diffusion of isolated strands, and assuming dif- 
fusion coefficients comparable to those of double strands 
and hairpin loops of similar length, ^^ we conclude that 
one step per strand corresponds to a time scale of ap- 
proximately 2 ps. Thus, our model allows our systems to 
be studied on millisecond time scales. 

At the end of the above MC simulations, our systems 
will not necessarily have reached equilibrium, both be- 
cause the energy barriers to escape from misbonded con- 
figurations can be difficult to overcome at low tempera- 
ture and the low rate of association at higher tempera- 
tures. Therefore, as a comparison we also compute the 
equilibrium thermodynamic properties of our systems us- 
ing umbrella sampling.**^' "^ Formally, we can write the 
thermal average of a function B{r^) in the canonical en- 
semble as: 



(B) 



I 



^ [WiQ) exp(-T//kB r)]drN 



wTq) 



(6) 



where Q — Q{r^) is an order parameter or reaction co- 
ordinate and V — V{r^) is the potential energy. We are 
free to choose W{Q), and by taking the term in square 
brackets as the weighting of states and keeping statistics 
for B/W and 1/W at each step we can find (B). In stan- 
dard MetropoHs MC, W — 1, but by choosing W{Q) in 
such a way that those states with intermediate values of 
Q are visited more frequently, the effective free energy 
barrier between (meta)stable states can be lowered al- 
lowing the system to pass easily between the free energy 
minima, and equilibrium to be reached. 

To ensure that each value of Q is equally likely to be 
sampled in an umbrella sampling simulation, one would 



choose W{Q) = exTp{(3A{Q)), where A{Q) is the free en- 
ergy as a function of the order parameter. To achieve 
this, however, would require knowledge of A{Q). In- 
stead, there are standard methods to construct W{Q) 
iteratively, but for the current examples it was possible 
to construct W{Q) manually, because of the relative sim- 
plicity of the free energy profiles. 

To a first approximation the interaction between fully 
bonded structures is negligible. Therefore, in the um- 
brella sampling simulations we consider systems contain- 
ing the minimum number of strands required to form a 
given object (two for a duplex and four for a Holliday 
junction). We then use the relative weight of bound and 
free states to extrapolate the expected fractional concen- 
trations for larger systems.^® The natural choice for the 
order parameter Q is the number of correct bonds, where 
two monomers are defined to be bonded if their energy 
of interaction is negative. 



III. RESULTS 

A. Duplex Formation 

We test the model by analysing the duplex bond- 
ing of two different complementary strands. We simu- 
late systems of ten oligonucleotides, initially not bonded, 
in a periodic cell with a concentration of 5.49 x 10^'"' 
molecules 1^^ (or 3.65 x 10^"' M). We separately consider 
strands consisting of 7 and 13 monomers, which corre- 
spond to two of the arms of the Holliday junction studied 
experimentally by Malo et al.^''^^ and which we consider 
in Section IIIB: 



7 bases 



G-A-G-T-T-A-G 
C-T-A-A-C-T-C 



13 bases 



G-C-G-A-T-G-A-G-C-A-G-G-A 
T-C-C-T-G-C-T-C-A-T-C-G-C 



(7) 



(8) 



where we have listed strands in the 5 '-3' sense for consis- 
tency with the literature. The yields of correctly-bonded 
and misbonded structures at the end of the simulations 
are depicted in Figure 2 as a function of temperature. 
We also display the predicted equilibrium fraction of cor- 
rectly bonded strands for both systems obtained using 
umbrella sampling. For convenience, we define a cor- 
rectly bonded structure to have more than 70% of the 
bonds of the complete duplex and no bonds to other 
strands. Any other structure is recorded as 'misbonded'. 
Figure 2 shows a maximum in the yield as a function of 
temperature. Such behaviour is typical of self-assembling 



systems 



47-51 



and reflects the thermodynamic and dy- 



namic constraints on the self-assembly process. Firstly, 
the yield is zero at high temperature where only ssDNA 
is stable, and rises just below the expected equilibrium 
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FIG. 2: (Colour online) Yields of correctly-formed duplexes 
and misbonded configurations at the end of our MC sim- 
ulations (lines with data points, as labelled) compared to 
the equilibrium probability of the strands adopting the cor- 
rect structure as obtained by umbrella sampling. The solid 
and dashed lines represent results for strands with 7 and 13 
monomers, respectively. The MC results are averages over 



ten runs of length 3 x 10 
in the simulation cell. 



steps per strand with ten strands 



value as the temperature is decreased, the deviation aris- 
ing due to the large number of steps required to reach 
equihbrium. At low temperatures, the yield falls away 
due to the presence of kinetic traps which are now stable 
with respect to isolated strands, as evidenced by the rise 
in 'misbonded structures' in Figure 2. Thus, there is a 
non-monotonic dependency of yield on temperature and 
an optimum region for successful assembly, which corre- 
sponds to the region where only the desired structure is 
stable against thermal fluctuations. Figure 3 is a snap- 
shot from near the end of a simulation in this regime. It 
should be noted that neglecting helicity has the effect of 
increasing the flexibility of dsDNA in the direction per- 
pendicular to the plane of bonding (a helix cannot bend 
in any direction without disturbing its internal structure 
whereas a 'ladder' can). 

The heat capacity obtained from umbrella sampling of 
pair formation is shown in Figure 4(a). The heat ca- 
pacity peaks indicate a transition from single strands to 
a duplex. As the formation of duplexes is essentially a 
chemical equilibrium between monomers and clusters of 
a definite size (in this case two) the width of the peaks 
will remain finite as the number of strands is increased. 
The transition does, however, become increasingly nar- 
row as the DNA strands become longer, as is evident 
from comparing the heat capacity peaks for the 7-mer 
and 13-mers. 

Figures 4(b) shows the free energy profile, F{Q), for 
the formation of a duplex. The initial peak at low Q is 
accounted for by the entropic cost of bringing two strands 
together. Once bonds are formed, however, adding extra 
bonds costs much less entropy whilst providing a signif- 






FIG. 3: (Colour online) Snapshot of a fully-assembled con- 
figuration in a MC simulation of ten 13-base strands at 
T* = 0.0971. In this image the colour of the backbone in- 
dicates the type of strand: red for G-C-G-A-T-G-A-G-C-A- 
G-G-A and grey for its complement. Backbone sites are in- 
dicated by the large spheres, and bases by the small, blue 
spheres. 



icant decrease in energy, explaining the monotonic de- 
crease in F{Q) beyond Q = 2. The rise between Q = 1 
and Q = 2 is partly due to the fact that in order to 
form two bonds between strands the relative orientation 
of strands must be specified whereas this is not true for 
Q = I: hence there is an additional entropy penalty to 
the formation of the second bond. In addition, there ex- 
ist structures with only one correct bond that are stabi- 
lized by additional incorrect bonds and these misbonded 
configurations also contribute to F(l). The constant gra- 
dient above Q = 2 indicates that the energetic gain and 
entropic cost of forming an extra bond are approximately 
constant at a given temperature, which is consistent with 
the assumptions underlying nearest-neighbour models of 
DNA melting. 24 

In Figure 5, we compare our melting curves to those 
predicted by a simple two-state model, ^^ using the same 
mapping of the reduced temperature as in Section II A. 
In the two-state model the molar concentrations of prod- 
uct (AB) and reactants (A,B) are given by the equilib- 
rium relation: 



[AB] 

[ahbI 



exp 



-AHq + TASq 
keT 



(9) 



where AHq and ASq are assumed to be constants which 
depend only on the strand sequences and the salt concen- 
tration (we take [Na+] = 0.445M as in Section II A). We 
use the enthalpy and entropy changes of duplex forma- 
tion calculated by "HyTher" ,^'^ a program that estimates 
these values using the "unified oligonucleotide nearest 
neighbour parameters" .^^'^^ The authors claim that the 
thermodynamic parameters predicted by HyTher give 
the melting temperature Tm (the temperature at which 
the fraction of bonded strands is 1/2) of a duplex to 
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FIG. 4: Thermodynamics for the formation of a single du- 
plex, (a) Heat capacity curves for 7- and 13-base systems, 
as labelled, (b) Free energy profile associated with the for- 
mation of a 13 base pair duplex at different temperatures (as 
labelled) , where Q represents the number of correctly formed 
bonds in the duplex. 



within a standard error of ±2.2°C.^'^ Figure 5 shows that 
our system reflects the melting temperatures predicted by 
the two-state model with reasonable accuracy, excepting 
sequence dependent effects which are not included in our 
model, because the interaction energies between A-T and 
C-G complementary base pairs have for simplicity been 
taken to be the same. The widths of transitions are seen 
to be of the same order, but slightly larger for our model. 
This feature, which is typical of coarse-grained models, '^^ 
indicates that the degree of entropy loss on hybridiza- 
tion is too small in our model, and is due to a failure to 
accurately incorporate all degrees of freedom which be- 
come frozen on hybridization. However, the agreement is 
sufficiently good that the basic features of physical DNA 
assembly should be reproducible. 

A further satisfying feature of the model is that 'dis- 
placement' was observed on several occasions. This pro- 
cess, during which a misbonded pair of strands is broken 
up by a third strand, is illustrated in Figure 6. The third 
strand is able to bond to the pair, as some bases are free 
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FIG. 5: (Colour online) The bulk equilibrium probability of 
strands being in a correct duplex extrapolated from our um- 
brella sampling simulations (solid lines) compared to the pre- 
dictions of an empirical two-state model (dashed lines). Re- 
sults are presented for strands with 7 and 13 bases, as la- 
belled. For the two-state model, as well as the results for the 
sequences in Eqs. (7) and (8) (lines (a) and (d)), sequences 
corresponding to the other two arms of the HoUiday junction 
(Figure 7) are considered. Only one line is shown for the um- 
brella sampling results, because A-T and G-C have the same 
binding energy in our model. Temperatures in our model are 
converted to °C using the same mapping given in Section II 
A. 



in the misbonded structure. Thermal fluctuations allow 
the new strand to bond to sites previously involved in 
misbonding, in a process known as 'branch migration'. 
Eventually one of the misbonded strands is completely 
displaced, leaving a correct duplex and an isolated single 
strand. This behaviour is observed in real DNA systems, 
and is the driving mechanism of some nanomachines^"* 
and DNA catalyzed reactions. ^^'^^ 



B. Holliday Junction 

Encouraged by the above results, we next apply the 
model to the formation of a Holliday junction. Holli- 
day junctions consist of four single strands which bind to 
form a four-armed cross. In our case we consider a Hol- 
liday junction with two long arms (13 bases long) and 
two short arms (7 bases long). We use the experimental 
base ordering of Malo et al.^^ with the 'sticky ends' re- 
moved. (These sticky ends consist of six unpaired bases 
on the end of arms and their purpose is to allow the Hol- 
liday junctions to bond together to form a lattice). The 
sequences of the four DNA strands and schematic dia- 
grams of the possible junctions that they can form are 
shown in Figure 7. 

Initially we studied a system of 20 strands (five of each 
type) that has the potential to form five separate junc- 
tions. We use a concentration of 1.56 x 10~^ molecules l~^ 
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FIG. 6: (Colour online) Snapshots illustrating four stages in the process of displacement, (a) A third strand binds to a 
misbonded pair, (b) The third strand is prevented from forming a complete duplex by the misbond. (c) Thermal fluctuations 
cause bonds in the misbonded structure to break and be replaced by the correct duplex, (d) The misbonded strand is displaced 
and the correct duplex is formed. 



1 . CTA ACT C // AA TGC CTT CTG G A 

2. CGCATGAGCAGGA//GAGTTAG 

3. TGT TCC G // TC CTG CTC ATC GC 

4. TCC AG A AGG CAT T // CG GAA CA 



a) 



b) . 
I 
I 
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FIG. 7: (Colour online) A schematic diagram showing the 
sequences of the strands used in our HoUiday junction simu- 
lations, and the alternative bound states that are possible: (a) 
the square planar configuration and (b) the x-stacked form. 



(which corresponds to 1.04 x 10 ^M). The results are dis- 
played in Figure 8. 

The results are as expected for the bonding of the 
longer arms (which we now describe as 'a-bonding'). The 
yield again displays the characteristic non-monotonic de- 
pendence on temperature. We obtain very few com- 
plete junctions, however, which is due to two effects. 
Firstly, each simulation is performed at constant temper- 
ature, which means the hierarchical route to assembly is 
less favoured than when the system is cooled, as in the 
experiments.^"'^* When the system is gradually cooled, 
Figure 8 suggests that at around T^nia) = 0.111 we would 
expect to find a region in which only a-bonded dimers 
were stable with respect to ssDNA. If the cooling was 
sufficiently slow on the timescale of bonding, all strands 
would form a-structures at around T^(a). At lower tem- 
peratures, when the HoUiday junction becomes stable 
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FIG. 8: (Colour online) A comparison of the kinetics and ther- 
modynamics for a system of 20 strands that can potentially 
form five HoUiday junctions, where the MC simulations are 
initiated from a purely single-stranded configuration. The 
MC results (lines with data points) are the final yield of 
HoUiday junctions, and the fraction of strands involved in a 
correctly-formed long (a) or short (/3) arm, or in misbonding, 
as labelled. The results are averages over five runs of length 
10® steps per strand. For comparison, the equilibrium proba- 
bilities of being in a a-bonded dimer and a HoUiday junction 
are also plotted, along with the equilibrium probability of be- 
ing in a /3-bonded dimer if the longer arms are not allowed to 
hybridize. 



with respect to the a-bonded dimers, many competing 
minima would then be inaccessible to the system as they 
would require the disassociation of stable a-bonded pairs. 
The free energy landscape of two a-structures forming a 
HoUiday junction is consequentially much simpler than 
that of four single strands forming a junction at a given 
temperature. Therefore, one expects the yield for self- 
assembly at constant temperature to be lower than when 
the system is cooled, because there is only a relatively 
narrow temperature window between where the HoUi- 
day junction becomes stable and misbonded configura- 
tions start to appear. Indeed, the shorter arms are only 
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FIG. 9; (Colour online) Snapshot showing five HoUiday junc- 
tions formed at T = 0.0842 after 5.67 x 10** MC steps per 
strand. Again, the backbone colour indicates strand type (1: 
red, 2: grey, 3: orange, 4: yellow) where numbers refer to 
Figure 7 



marginally more stable than some competing minima, as 
evidenced by the rise in misbonded structures in Fig. 8 at 
temperature just below where /3-bonded structures first 
appear. 

Secondly, even in the temperature range where HoUi- 
day junction formation is not hindered by the formation 
of misbonded configurations, the yield is low because the 
Metropolis MC algorithm artificially reduces the diffu- 
sion of bound pairs, and hence the likelihood that two 
pairs of a-bonded strands come together to form a junc- 
tion is also reduced. This is because the acceptance prob- 
ability of trial moves for bonded strands is much lower 
than for isolated strands^^, due to the energy penalty 
associated with trying to move a bound pair apart. 

Interestingly, examination of the equilibrium lines in 
Figure 8 shows that the HoUiday junctions are actu- 
ally stable at a higher temperature than the individual 
shorter arms. This is because the total loss of entropy 
when two a-bonded dimers bind together is consider- 
ably less than that for two short arms in isolation (as 
fewer translational degrees of freedom are lost), whereas 
the energy change is comparable. Thus, there is a small 
temperature window at T* w 0.1 where hierarchical as- 
sembly can occur at constant temperature as the short 
arms are only stable once a-bonding has taken place. 
However, due to the deficiencies in the MC simulations 
mentioned above, the yield of HoUiday junctions in this 
region is practically zero. Instead, the maximum yield of 
HoUiday junctions occurs at lower temperatures where 
non-hierarchical pathways that proceed by the addition 
of single strands become feasible. 

The above simulations were only able to successfully 
model the first stage of the HoUiday junction assembly. 
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FIG. 10: (Colour online) The yields of HoUiday junctions (HJ) 
and misbonded configurations for MC simulations, where the 
initial configuration was a pair of a-bonded dimers. For com- 
parison, the equilibrium probabilities of being in a a-bonded 
dimer and a HoUiday junction are also plotted. The results 
are averages over five runs of length 7.5 x 10* steps per strand. 



namely the formation of a-bonded dimers. To probe 
the second stage of assembly, we must first make two 
modifications to our simulation approach to overcome 
the two deficiencies mentioned above. Firstly, we study 
systems initially consisting of pairs of a-bonded strands, 
which we assume have successfully formed at some higher 
temperature — this is reasonable given the results of our 
earlier simulations. Secondly, we also include simple local 
cluster moves in addition to those which move only one 
strand, i.e. translations, rotations and bending of pairs of 
a-bonded strands. With these changes incorporated, we 
simulate the same system for 7.5 x 10* steps per strand 
at a range of temperatures below Tj^ia). It should be 
noted that due to a change in the size of typical moves, 
one move per strand now corresponds to approximately 
lOps. 

We find that HoUiday junctions form over a wide range 
of intermediate temperatures, whilst kinetic traps at 
low temperature lead to incomplete bonding and con- 
sequently to the possibility of forming large clusters. A 
typical result from the high-yield regime is shown in Fig- 
ure 9. As fully-bonded HoUiday junctions are essentially 
inert, it is reasonable to analyse their assembly behaviour 
by considering only one junction. The smaller system size 
has the effect of increasing the assembly rate, because the 
strands have less distance to diffuse, but does not affect 
the basic assembly mechanism. We therefore simulated 
systems consisting of two a-bonded pairs with the same 
concentration as above. 

We also introduced some modifications to the the um- 
brella sampling scheme in order to more efficiently com- 
pute the thermodynamics of the second-stage of HoUi- 
day junction formation. As well as cluster moves, we 
also introduced a 'tethering' component in the weighting 
function W{Q). We introduce a length Tmin that cor- 



responds to the shortest distance between any pair of 
backbone sites on different strands. We then spUt Q = 



(a) 



into two regions: we weight those states with 



< 31 



with W = 1 but for 



> 3/ we use W^ = 0.1. This en- 



ables us to increase the rate of transitions between Q — 
and 1, and reduces the time spent simply simulating the 
diffusion of a-bondcd dimcrs waiting for a collision to 
occur. 

The MC results are plotted in Figure 10 along with 
the equilibrium results obtained from umbrella sampling. 
With the cluster moves in place, we now see a high yield 
of Holliday junctions and a broad maximum in the yield 
as a function of temperature. The hierarchical pathway 
has the effect suggested earlier. Namely, the temperature 
window over which correct formation can occur is vastly 
increased, as the most significant competing minima arc 
inaccessible because their formation would require disso- 
ciation of the a-bonded pairs. 

The model is therefore consistent with the experimen- 
tally observed hierarchical assembly of Holliday junctions 
as the system is cooled. ^'■^^ It should be noted, how- 
ever, that the junctions in our model usually form in 
the 'square planar' as opposed to the 'x-stacked' shape 
(Figure 7) that is observed under normal experimental 
conditions. The preference for one structure is a subtle 
consequence of the concentration of cations and the pre- 
cise helical geometry of DNA.^^ This level of detail is not 
included in our coarse-grained model, so it is not surpris- 
ing that it cannot reproduce the preference for ^-stacked 
structures. Moreover, it is relatively easy to see why in 
our model, which forms 'ladders' rather than helices, a 
planar geometry is preferred for the junction. 

Some of the equilibrium thermodynamic properties as- 
sociated with the formation of a Holliday junction are 
shown in Figure 11. In particular. Fig. 11(b) shows the 
free energy profile for the formation of a Holliday junc- 
tion from two a-bondcd pairs. The initial peak and sub- 
sequent drop is very similar to that for the duplexes and 
can be accounted for in the same way. However, the for- 
mation of the two arms is not like the zipping up of a 
14-base duplex, because there is much more relative free- 
dom of movement for the bases on either side of the a- 
bonded sections in the dimers than for consecutive bases 
on single-stranded DNA. Thus, there is a rise between 
M = 7 and M = 8 that is a result of the entropy penalty 
of bringing together the two ends to make the second 
short arm. We note that the penalty is much smaller 
than the initial cost of bringing the two a-bonded pairs 
together, and as a result, the value of Tm for the junction 
is higher than for the short arms in isolation, as noted 
earlier. 

An interesting feature of Figure 11(b) is the plateau 
between Q — 6 and Q = 7. In general, when two a- 
bonded pairs meet to form one short arm, there is an en- 
tropic penalty associated with the excluded volume that 
the remaining bases in the a-structures represent to each 
other. This excluded volume is a large fraction of the 
total available space if one complete short arm is formed. 
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FIG. 11: Thermodynamics for the formation of a single Hol- 
liday junction, (a) Heat capacity curves for two a-bonded 
dimers forming a Holliday junction (HJ) and four strands 
forming two a-bonded dimers, as labelled, (b) Free energy 
profiles for the formation of a Holliday junction from two a- 
bonded dimers at different temperatures, as labelled. Q rep- 
resents the total number of correct bonds in the short arms 
of the junction. 



so that there are no free monomers between the short 
arm and the a-bonded sections. As a consequence, there 
is not the usual free energy benefit from forming the final 
bond in the short arm (the one closest to the centre of 
the Holliday junction), as the excluded volume penalty is 
large and those states that are allowed involve distortion 
of the backbones and bonds near the centre of the Hol- 
liday junction. Although the details of this free energy 
penalty and the other features in Fig. 11(b) will depend 
on the exact geometry of the system, we expect the cal- 
culated free energy profile to be representative of that for 
real DNA. 

It is possible to extend the two-state model discussed 
in Section HI A to the formation of a Holliday junc- 
tion by considering the concentrations of all four isolated 
strands, the two a-bonded intermediates and the junc- 
tion itself. We assume Eq. (9) holds for every possible 
transition, use the same thermodynamic parameters as 
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FIG. 12: (Colour online) Bulk equilibrium probability of 
strands being in a HoUiday junction (HJ) or an a-bonded 
dimer computed by umbrella sampling (solid lines) and by 
the extended two-state model (dashed lines), where lines (a) 
and (b) represent the two possible a-bonded dimers. 



before and apply conservation of total strand number. 
To estimate Ai^o and ASq associated with the forma- 
tion of a HoUiday junction, wc construct a single strand 
by linking the ends of the oligonucleotides together with 
four non-bonding bases. The thermodynamic parame- 
ters associated with the folding of this structure are pre- 
dicted by "UNAFold".^^ The correction for the fact that 
our strands are not connected by loops is discussed by 
Zuker.^^ This leaves five simultaneous equations (assum- 
ing perfect stochiometry) which can be solved numeri- 
cally. 

Figure 12 compares this extended two-state model 
(ETSM) with the bulk thermodynamics predicted by um- 
brella sampling (using the same temperature scaling as 
before). ETSM predictions for both stages of HoUiday 
junction formation agree well with our results, which 
again supports our hypothesis that much of the physics 
of self-assembly can be reproduced by a simple coarse- 
grained model. The extra width of the transitions in our 
model occurs for the same reasons as mentioned in Sec- 
tion HI A when discussing Fig. 5. 
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FIG. 13: (Colour online) Simulation results for the badly- 
designed HoUiday junction of Section IIIC, where the ini- 
tial configuration was a pair of a-bonded dimers. The lines 
with data points give the yield of correctly-formed junctions 
and misbonded configurations, as labelled. For comparison 
the solid line gives the equilibrium probability that the well- 
designed sequences of Section IIIB adopt a HoUiday junction. 




FIG. 14: (Colour online) Example of a competing minimum 
for a badly designed HoUiday junction. The snapshot is taken 
from a simulation at T* = 0.0936. 



C. Negative Design 



The hierarchical pathway for the formation of the Hol- 
liday junction is one aspect of the sequence design that 
aids the formation of the correct structure. The experi- 
mental base ordering of the HoUiday junction, however, 
was also chosen to minimize the number of competing 
structures — a typical example of 'negative design'. ^° We 
illustrate the importance of such negative design by con- 
sidering a badly-designed junction, where the comple- 
mentary seven-base sections consist of just one base type 
each. We simulate a system of two a-bonded pairs with 



the short arms so modified under the same conditions as 
for Fig. 10 and the results are shown in Figure 13. Al- 
though there is some probability of forming the correct 
junction, the simulations are dominated by misbonded 
junctions, such as the one depicted in Figure 14. Al- 
though these competing structures are energetically less 
stable than the target junction because of the presence 
of unpaired bases at the 'dangling' ends, they are read- 
ily accessible, because the likelihood that the first bonds 
formed between two a-bonded pairs are in the same reg- 
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istry as the target structure is low. The yield of the 
correct Holliday junctions will then depend upon how 
readily the system is able escape from these malformed 
junctions. Clearly, this process is slow on the time scales 
of the current simulations, and is also likely to hinder the 
location of the target structure in experiment. 



IV. DISCUSSION 

In this paper we have introduced a simple coarse- 
grained model of DNA in order to test the feasibility 
of modeling the self-assembly of DNA nanostructure by 
Monte Carlo simulations. Any such model involves a 
trade-off between detail and computational simplicity, 
and here we deliberately chose to keep the model as 
simple as possible in order to give us the best chance 
of being able to probe the time scales relevant to self- 
assembly. The model involves just two interaction sites 
per nucleotide. 

The results from our model are very encouraging. 
Firstly, we have shown that using our model it is fea- 
sible to model the self-assembly of both DNA duplexes 
and a Holliday junction. The latter represents, to the 
best of our knowledge, the first example of the simula- 
tion of the self-assembly of a DNA structure beyond a du- 
plex. Secondly, the model succeeds in reproducing many 
of the known thermodynamic and dynamic features of 
this self-assembly. For example, the equilibrium melting 
curves agree well with those predicted by the nearest- 
neighbour two-state model, ^^ which is known to predict 
melting temperatures very accurately. The model is also 
able to capture important dynamical phenomena such as 
displacement. 

Thirdly, by analysing the thermodynamic and dynamic 
constraints on assembly, we have been able to gain some 
important physical insights into the nature of DNA self- 
assembly and how to control it. For example, the opti- 
mal conditions for self-assembly are in the temperature 
range just below the melting temperature of the the tar- 
get structure, where this structure is the only one sta- 
ble with respect to the precursors, be they ssDNA or 
some intermediate in a hierarchical assembly pathway. 
At lower temperatures, misbonded configurations can be 
formed that act as kinetic traps and reduce the assem- 
bly yield. Similar trade-offs between the thermodynamic 
driving force and kinetic accessibility have been previ- 
ously seen in a variety of self-assembling systems, ''^"^^ 
and also give rise to a maximum in the yield near to 
and below the temperature at which the target structure 
becomes stable. 

We have also seen how hierarchical self-assembly 
through cooling can be a particularly useful strategy to 
aid self-assembly, because the formation of stable inter- 
mediates at higher temperatures simplifies the free en- 
ergy landscape for the assembly of the next stage in the 
hierarchy by reducing the number of misbonded config- 
urations available to the system. This simplification of 



the energy landscape is likely to be a general feature of 
hierarchical self-assembly. 

Thus, our results have confirmed the utility of using 
coarse-grained DNA models to study the self-assembly 
of DNA nanostructures, and supported our hypothesis 
that much of the physics can be explained by describing 
DNA as a semi-flexible polymer with selective attractive 
interactions. The model's success in forming junctions 
in reasonable computational time suggests that it will be 
possible to develop further models that have an increased 
level of detail, but which can still access the time scales 
relevant to self-assembly. 

The model has also highlighted some features which it 
would be advantageous to include in such models. For ex- 
ample, greater accuracy in the details of oligonucleotide 
geometry, particularly the helicity of dsDNA, would allow 
features such as the characteristically long persistence 
length of hybridized strands to be reproduced and give 
the appropriate degree of rigidity to simulated nanostruc- 
tures. Such improvement might also allow more compli- 
cated motifs to be accounted for, such as the preference 
for x-stacked Holliday junctions that the current model 
could not reproduce. 

It should be noted that if one is to introduce helicity 
in a physically reasonable way it should also allow for ss- 
DNA to undergo a stacking transition to a helical form. 
This transition may play a significant role in the thermo- 
dynamics and kinetics of self-assembly.®^ Previously pro- 
posed coarse-grained DNA models that incorporate helic- 
ity have not been designed to accurately reproduce this 
feature. Incorporating extra degrees of freedom which 
are relevant to the stacking transition, such as the rota- 
tion of the base with respect to the sugar-base bond, may 
also help to increase the entropy change on hybridization 
and hence make the transition narrower as required. 

The approximation to diffusive dynamics provided by 
the local move Metropolis Monte Carlo algorithm could 
also be improved. Currently the 'local' moves involve 
displacing, rotating or bending entire strands or pairs 
of strands — these effectively constitute cluster moves of 
groups of strongly bound nucleotides, and result in slow 
relaxation and translation times within bound structures. 
More realistic dynamics may be achievable by considering 
trial moves of individual nucleotides, and incorporating 
cluster moves in a more systematic fashion, such as in 
the 'virtual move' MC algorithm proposed by Whitelam 
and Geissler.^^'®^ 

One potential issue with any coarse-graining is how 
it preserves the different time scales in a system. In 
Section II B we assigned an approximate mapping be- 
tween the number of Monte Carlo steps and physical time 
based upon comparison of diffusion coefficients. There 
are, however, other important time scales in the system, 
such as the time scale for the internal dynamics of an iso- 
lated strand and the time scale over which the 'zipping- 
up' of two strands occurs after a bond has been formed. 
Comparisons of experimental diffusion coefficients^'* and 
melting and bubble formation from molecular dynamics 
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simulations'^''''^^ suggest a large separation in time scale 
between diffusion-limited processes and those that rely 
on the dynamics of individual nucleotides. Encourag- 
ingly, we observe a similar time scale separation in our 
model: zipping-up and thermal relaxation of isolated 
strands occur over times scales shorter than 10^ steps 
per strand, whereas association typically required on the 
order of 10^ to 10® steps per strand near the melting 
temperature (corresponding to tens or hundreds of mi- 
croseconds). Furthermore, we would argue that it is this 
time scale separation, and not the precise ratios of the 
relevant rate constants, that it is important to reproduce 
in self-assembly simulations. 

We should also note that the mapping of the diffu- 
sion constants between the model and experiment will 
not necessarily ensure that the rate of association is ac- 
curate in our model, because although the frequency of 
collisions in our model should be correct, there is also the 



contribution to the association rate from the probability 
that a collision will lead to successful association. That 
we can reproduce the thermodynamics of the DNA melt- 
ing transitions implies that the rates of association and 
disassociation have the right ratio, but not that they nec- 
essarily have the correct absolute value. For example, it 
is conceivable that hclicity (both in dsDNA and possibly 
in ssDNA), which is not included in the current model, 
will influence the likelihood that a collision is successful. 
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